3 research outputs found

    Vision Language Models in Autonomous Driving and Intelligent Transportation Systems

    The applications of Vision-Language Models (VLMs) in Autonomous Driving (AD) and Intelligent Transportation Systems (ITS) have attracted widespread attention due to their outstanding performance and their ability to leverage Large Language Models (LLMs). By integrating language data, vehicles and transportation systems can deeply understand real-world environments, improving driving safety and efficiency. In this work, we present a comprehensive survey of the advances in language models in this domain, covering current models and datasets. Additionally, we explore potential applications and emerging research directions. Finally, we thoroughly discuss the challenges and research gaps. The paper aims to provide researchers with an overview of current work and future trends of VLMs in AD and ITS.
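
    As a quick illustration of the idea sketched above (not taken from the survey itself), the snippet below queries an off-the-shelf vision-language model about a driving scene; the BLIP-VQA checkpoint, the image path, and the question are assumptions made for the example.

        from PIL import Image
        from transformers import BlipForQuestionAnswering, BlipProcessor

        # Off-the-shelf VLM; any visual question answering model would do here.
        processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
        model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

        # Hypothetical front-camera frame from the ego vehicle.
        image = Image.open("front_camera_frame.jpg").convert("RGB")
        question = "Is there a pedestrian crossing the road ahead?"

        inputs = processor(image, question, return_tensors="pt")
        answer_ids = model.generate(**inputs)
        print(processor.decode(answer_ids[0], skip_special_tokens=True))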

    RT-DLO: Real-Time Deformable Linear Objects Instance Segmentation

    Deformable Linear Objects (DLOs) such as cables, wires, ropes, and elastic tubes are ubiquitous in both domestic and industrial environments. Unfortunately, robotic systems that handle DLOs are rare and have limited capabilities, owing to the difficulty of perceiving them. Hence, we propose RT-DLO, a novel approach for real-time instance segmentation of DLOs. First, the DLOs are semantically segmented from the background. Afterward, a novel method separates the DLO instances: given the semantic mask, it builds a graph representation of the scene in which the nodes are sampled from the DLO centerlines and the edges are selected by topological reasoning. RT-DLO is experimentally evaluated against both DLO-specific and general-purpose instance segmentation deep learning approaches, achieving overall better performance in terms of both accuracy and inference time.
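
    The graph-construction step can be pictured with a simplified sketch such as the one below. It is not the RT-DLO implementation: the naive proximity rule for edges merely stands in for the paper's topological reasoning, and the function name, stride, and radius are illustrative.

        import numpy as np
        import networkx as nx
        from skimage.morphology import skeletonize

        def mask_to_graph(semantic_mask, stride=8, radius=12.0):
            # Centerline extraction: skeletonize the binary DLO mask.
            skeleton = skeletonize(semantic_mask.astype(bool))
            ys, xs = np.nonzero(skeleton)
            # Node sampling: keep every `stride`-th centerline pixel.
            nodes = list(zip(ys.tolist(), xs.tolist()))[::stride]
            graph = nx.Graph()
            for i, (y, x) in enumerate(nodes):
                graph.add_node(i, pos=(y, x))
            # Edge selection: a naive proximity rule; RT-DLO instead reasons
            # about the topology of crossings and branch points.
            for i in range(len(nodes)):
                for j in range(i + 1, len(nodes)):
                    dist = np.hypot(nodes[i][0] - nodes[j][0],
                                    nodes[i][1] - nodes[j][1])
                    if dist < radius:
                        graph.add_edge(i, j, weight=float(dist))
            return graph

    Connected components of such a graph (e.g., via nx.connected_components) would then yield one subgraph per DLO instance.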

    Point Cloud Registration With Object-Centric Alignment

    Point cloud registration is a core task in 3D perception that aims to align two point clouds. Registering point clouds with low overlap is an even harder challenge, on which previous methods tend to fail. Recent deep learning-based approaches attempt to overcome this issue by learning to find overlapping regions in the whole scene, but they still lack robustness and accuracy and may therefore be unsuitable for real-world applications. We present a novel registration pipeline that focuses on object-level alignment to align point clouds robustly and accurately. By extracting the object of interest and completing its missing points, a rough alignment can be achieved even for point clouds with low overlap captured from widely separated viewpoints. We provide a quantitative and qualitative evaluation on synthetic and real-world data captured with a Kinect v2. The proposed approach outperforms the current state-of-the-art methods by more than 29% in registration recall on the introduced synthetic dataset. We show that the object-level alignment increases overall performance and robustness, while the baselines perform poorly because they take the entire scene into account.
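
    To make the object-level alignment concrete, here is a minimal sketch of a generic coarse-to-fine registration step using Open3D; it is not the authors' pipeline, and it assumes the object points have already been extracted and completed upstream.

        import open3d as o3d

        def register_objects(src_pts, tgt_pts, voxel=0.02):
            # src_pts, tgt_pts: (N, 3) arrays of object-level points.
            def preprocess(pts):
                # Build, downsample, and describe the point cloud.
                pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
                pcd = pcd.voxel_down_sample(voxel)
                pcd.estimate_normals(
                    o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
                fpfh = o3d.pipelines.registration.compute_fpfh_feature(
                    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
                return pcd, fpfh

            src, src_fpfh = preprocess(src_pts)
            tgt, tgt_fpfh = preprocess(tgt_pts)
            # Coarse pose from FPFH feature matching with RANSAC.
            coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
                src, tgt, src_fpfh, tgt_fpfh, True, voxel * 1.5,
                o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
                3,
                [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
                o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
            # Fine refinement with point-to-plane ICP.
            fine = o3d.pipelines.registration.registration_icp(
                src, tgt, voxel * 0.4, coarse.transformation,
                o3d.pipelines.registration.TransformationEstimationPointToPlane())
            return fine.transformation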